Random walks in the space of conformations of toy proteins 
Monte Carlo dynamics of a 48-monomer lattice toy protein is interpreted as a random walk in an abstract (discrete) space of conformations. To test the geometry of this space, we examine the return probability P(T), which is the probability to find the polymer in the native state after T Monte Carlo steps, provided that it starts from the native state at the initial moment. Comparing computational data with the theoretical expressions for P(T) for random walks in a variety of different spaces, we show that conformational spaces of polymer loops may have non-trivial dimensions and exhibit negative curvature characteristic of Lobachevskii (hyperbolic) geometry.
Levinthal's paradox is universally considered to be the essence of the protein folding problem. In its most direct form, the paradox revolves around the exponentially large number of possible conformations being immeasurably larger than what a protein can conceivably test within the observable time scale. Levinthal's paradox arises from thinking of protein folding as a search in a conformational space resembling a golf course with just one hole representing the native state. To resolve the problem, it has been conjectured in the literature […] that volume interactions between monomers should provide an energetic bias towards the native state. Although undoubtedly correct, this should not overshadow the necessity to understand the search in conformational space. Indeed, if we start from an open coil conformation, then volume interactions cannot provide any significant bias for a while, at least until some minimal number of contacts has been formed. In macroscopic terms, this initial stage is an uphill climb over an entropic barrier. In microscopic terms, it is a random walk in conformation space. In order to initiate the energy-driven downhill slide towards the native state, or to enter the funnel-shaped area of the free energy landscape […], a fluctuation has to provide a sufficient decrease in entropy; in other words, a random walk has to bring the system into a specific region in conformation space.

This way of thinking implies the following resolution of Levinthal's paradox: there is no need for a random unbiased search to find a single native state; the search only needs to bring the system into some region, $\omega$, in conformation space. Physically, $\omega$ corresponds to the transition (macro)state, most likely to a critical nucleus of some kind […]. Therefore, the volume and shape of $\omega$ in conformation space are dictated both by sequence-specific energy factors and by sequence-independent properties of conformations. The system searches for this critical (macro)state $\omega$ through a random walk in conformation space, largely unaffected by heteropolymeric interaction energies. Understanding this process is a problem of ordinary polymer dynamics […], and in some cases it may be reduced to the kinetics of a homopolymer collapse […]. Unfortunately, this problem is rather difficult and remains out of reach of current simulation techniques.

In order to pave the way to it, we consider in this work another, related problem of random walks in conformation space, namely that of fluctuations around the native state. This is itself a pressing issue in protein folding theory. Indeed, understanding these fluctuations is necessary in order to address the corrections to mean-field theory, which is formulated in terms of the Random Energy Model (see […] and the review […] with a multitude of references therein).
FIG. 1. The 48-mer conformation. The dark contacts indicate the critical folding nucleus for this particular conformation, according to the data of Mirny et al. […]. The loops considered in the present work are shown as thick lines.



We restrict ourselves to the standard lattice protein model of 48 monomers. In particular, we choose to work with the particular native state conformation addressed in […]. In order to remain in the vicinity of the native state, we permanently fix the contacts which form the nucleus. The nucleus conformation, as conjectured in […], is shown in Figure 1. Finding the nucleus even for one given native conformation is a difficult and unresolved problem in protein folding. Mirny et al. obtained the nucleus for the native state shown in Figure 1 by a procedure in which interactions between various monomer pairs were mutated. Those interactions which are necessary for a stable folded configuration were examined for a conserved set. The contacts belonging to the conserved set were inferred to be the nuclear contacts (see Figure 1). As a matter of fact, the procedure for determining the nucleus remains a subject of scrutiny and heated debate […]. The various models of a single nucleus […], of multiple nuclei […], and of nucleation classes […] are being debated, but the choice among the different models is not the subject of this work. The nucleus from the work […] used in the present work was chosen arbitrarily among a large number of possible conformations with nuclei surrounded by loops.

In order to address purely entropic factors, we examine the folding of the polymer with no interactions (except for the constraints of polymer connectivity, excluded volume, and the fixed contacts). The resulting structure is essentially that of many loops with ends fixed in the nucleus. For the discrete lattice model, we consider the conformational phase space as a graph in which the conformations are represented by nodes. If two conformations can be interconverted via a single Monte Carlo move (end flip, corner flip, or crankshaft), their corresponding nodes are connected by an edge on the graph […]. A Monte Carlo run is thus equivalent to a random walk on that graph. From that point of view, folding is equivalent to performing a random walk which returns to the origin. Thus, our plan is as follows: we perform a long Monte Carlo run of the loop model described above, and we record all the moments of time (Monte Carlo steps) of spontaneous folding, that is, of random arrival at the origin, which is the native state. This gives us the return probability, P(T), as a function of Monte Carlo time, T. In order to interpret these data, we compare them with a summary of the known results for the expected behavior of P(T) in a variety of different spaces. Before proceeding with this plan, two short comments must be made. (a) We are referring to the return probability P(T), not the first-return probability. In other words, the system is said to return to the origin at time T no matter how many times it may have visited the origin previously. (b) There is a potential source of terminological confusion due to the well-known analogy between polymer conformations and trajectories of random walks. Indeed, the conformation shown in Figure 1 is often described in terms of some walker in 3D space. We stress that we are speaking here about a completely different random walk. In our case, the walker is the entire protein chain, and the walk is performed in the abstract space of conformations.
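The estimator implied by this plan can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes the conformation graph is given as an adjacency list, and it uses a hypothetical "lazy" walk (a move is attempted with probability 1/2, otherwise the walker stays put) as a crude stand-in for rejected Monte Carlo moves. P(T) is estimated as the fraction of recorded native-state visits at time t that are followed by another native-state visit at time t + T, so every return counts, not only the first one, in line with comment (a).

```python
import random


def native_visit_times(adj, native, n_steps, seed=0):
    """Run one long lazy random walk on the graph `adj` (list of neighbor lists)
    and record every step at which the walker sits on the `native` node."""
    rng = random.Random(seed)
    x = native
    times = [0]
    for t in range(1, n_steps + 1):
        if rng.random() < 0.5:          # assumption: a move is attempted with probability 1/2
            x = rng.choice(adj[x])      # uniform choice among the allowed moves
        if x == native:
            times.append(t)
    return times


def return_probability(times, n_steps, t_max):
    """Estimate P(T), T = 0..t_max: among native visits at t (with t + T <= n_steps),
    the fraction that are also native visits at t + T (all returns, not first returns)."""
    visited = set(times)
    probs = []
    for lag in range(t_max + 1):
        origins = [t for t in times if t + lag <= n_steps]
        hits = sum(1 for t in origins if t + lag in visited)
        probs.append(hits / len(origins))
    return probs


if __name__ == "__main__":
    # Hypothetical toy graph with only two conformations, similar to loop 2 in the text.
    two_state = [[1], [0]]
    times = native_visit_times(two_state, native=0, n_steps=100_000, seed=1)
    print([round(p, 3) for p in return_probability(times, 100_000, t_max=5)])
```

For the two-conformation toy graph, the lazy walk forgets its position after a single step, so every estimated P(T) with T ≥ 1 should hover near 1/2.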

We are now ready to begin with a summary of the known results for the return probabilities in spaces of various geometries. We consider only discrete spaces, assuming every elementary step of the random walk to be of unit length.

• For a random walk of T steps in an unbounded Euclidean space of dimension d, the probability of return is

$$P(T) \simeq \left(\frac{2\pi T}{d}\right)^{-d/2} \propto T^{-d/2}. \qquad (1)$$

A square (or cubic) lattice is in this sense the discrete counterpart of Euclidean space with d = 2 (or d = 3).

• Equation (1) also holds for a fractal space with non-integer d; in this case d is the spectral dimension […].

• It is important for us to consider a random walk in a bounded region, because the set of conformations is always finite for lattice models, and thus the region of interest in conformation space is also finite. For a random walk in a bounded "cavity" in Euclidean space, or on a bounded fractal,

$$P(T) \simeq \begin{cases} \left(2\pi T/d\right)^{-d/2}, & T \ll R^{2}, \\ z_0/(zM), & T \gg R^{2}. \end{cases} \qquad (2)$$

Here R is the "size" of the allowed conformation space (graph), M is the total number of allowed conformations (or graph nodes), and $z_0$ and $z$ are the numbers of possible Monte Carlo moves (or incident graph edges) for the native state and averaged over all states, respectively. Equation (2) means that a random walk does not feel the bounds at "small" times, until it arrives at the boundary. At later times, it covers all of the available region in a uniform manner, and then the probability of visiting each point (conformation) $\alpha$ is simply proportional to the number of ways, $z_\alpha$, incident to that point. To better understand the meaning of R, it is useful to define the "distance" $R_{\alpha\beta}$ between two conformations, $\alpha$ and $\beta$, as the minimal number of elementary moves necessary to convert $\alpha$ into $\beta$. Then, according to graph theory, the diameter of the graph representing our conformation space is the maximum of $R_{\alpha\beta}$ over all pairs of conformations. Our R is then typically on the order of one half of this diameter. Note that in the limit of very long loops, $N \gg 1$, we expect the space diameter and R to scale as $R \sim N$.

• Equation (2) represents a particular example of switching, or crossing over, from one dimension to another. In other words, the conformation space may appear to have one dimension close to the native state and another dimension far from the native state. In this sense, a bounded space has dimension d at small scales and dimension zero (like a point) at larger scales.

• For a random walk in a d-dimensional Lobachevskii space,

$$P(T) \sim T^{-d/2} \exp\left(-\frac{T\Lambda}{2d}\right), \qquad (3)$$

where $\Lambda$ is the Gaussian curvature (inverse squared curvature radius) of the space. This formula can be explained in the following simple way. At scales $T \ll 1/\Lambda$, when the typical distance from the origin remains smaller than the curvature radius, the space appears effectively flat and Equation (3) reduces to (1). On the other hand, for very large T, P(T) is dominated by the exponential term, which can be understood if one remembers that a Cayley tree graph is the discrete counterpart of Lobachevskii space. It is important to consider Lobachevskii geometry because, as we mentioned, the conformation space diameter scales as N for very long loops in the $N \gg 1$ limit, while the number of conformations scales exponentially with N. This means that the number of conformations grows exponentially as a function of the distance from any given conformation (e.g., the native one). Such exponential growth is the signature of Cayley tree or Lobachevskii geometry […].

• The analysis of loop conformations which arise from a fixed nucleus of contacts becomes more complicated if we have multiple loops. Still, if the loops are independent of one another, then a simple estimate for the conformational space of all the loops combined can be obtained. Consider k loops, loop i containing a fraction $f_i$ of the total number of movable monomers. When a monomer is chosen for a Monte Carlo move, the probability that it belongs to loop i is $f_i$. Assume that each loop lives in an unbounded Euclidean space, so that the probability for loop i to fold after $t_i$ Monte Carlo steps is $P_i(t_i) \sim t_i^{-d_i/2}$, neglecting constant factors. The probability for all loops to return after time T, P(T), is thus

$$P(T) \simeq \sum_{t_1,\ldots,t_k=0}^{T} \delta\Big(T-\sum_i t_i\Big)\, T! \prod_{i=1}^{k} \frac{f_i^{t_i}}{t_i!}\, P_i(t_i), \qquad (4)$$

where $\delta$ denotes the Kronecker delta. Using (i) Stirling's approximation, (ii) $P_i(t) \sim t^{-d_i/2}$, and (iii) noticing that, due to the combinatorial factor, the sum is dominated at large T by the term in which $t_i = f_i T$, we obtain

$$P(T) \sim T^{-d_{\rm eff}/2}, \qquad d_{\rm eff} = \sum_{i=1}^{k} d_i. \qquad (5)$$

For independent loops, dimensions simply add to each other.

• In reality, different loops are not independent. To some extent they obstruct each other's folding. In general, for obstructing loops, we expect the effective dimension $d_{\rm eff}$ of the combined loops to satisfy

$$d_{\rm eff} < \sum_{i=1}^{k} d_i. \qquad (6)$$
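The power-law regime and the saturation level $z_0/(zM)$ of a bounded space can be checked numerically by propagating the walker's probability distribution exactly, step by step, instead of sampling it. The sketch below is illustrative only: it assumes a lazy walk (stay with probability 1/2, otherwise hop to a uniformly chosen neighbor), which rescales time and therefore changes the prefactor in Eq. (1) but not the exponent −d/2, and it uses a path graph as a hypothetical one-dimensional "cavity". For such a walk the long-time limit is $z_0/\sum_\alpha z_\alpha = z_0/(zM)$.

```python
import math


def path_graph(m):
    """Adjacency list of a path with m nodes: a hypothetical bounded 1D 'cavity'."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < m] for i in range(m)]


def exact_return_probabilities(adj, origin, t_max):
    """Exact P(T), T = 0..t_max, for a lazy random walk started at `origin`
    (assumption: stay with probability 1/2, else hop to a uniformly chosen neighbor)."""
    p = [0.0] * len(adj)
    p[origin] = 1.0
    out = [1.0]
    for _ in range(t_max):
        q = [0.0] * len(adj)
        for a, prob in enumerate(p):
            if prob == 0.0:
                continue
            q[a] += 0.5 * prob                       # lazy part: stay put
            share = 0.5 * prob / len(adj[a])         # moving part, split over the z_a edges
            for b in adj[a]:
                q[b] += share
        p = q
        out.append(p[origin])
    return out


def saturation_level(adj, origin):
    """Stationary weight of the origin, z_0 / sum_a z_a (that is, z_0 / (z M))."""
    return len(adj[origin]) / sum(len(nbrs) for nbrs in adj)


if __name__ == "__main__":
    # Small cavity: P(T) levels off at z_0 / (z M) = 2 / 40 for the central node.
    small = path_graph(21)
    P = exact_return_probabilities(small, 10, 5000)
    print(P[-1], saturation_level(small, 10))
    # Large cavity at short times: the walk never reaches the walls, so P(T) ~ T^(-1/2).
    big = path_graph(2001)
    Q = exact_return_probabilities(big, 1000, 400)
    print(math.log(Q[400] / Q[100]) / math.log(4))   # log-log slope, close to -d/2
```

The same propagation works for any adjacency list, so a small enumerated conformation graph could be substituted for the path graph.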

With this summary of mathematical results for the return probability in different geometries, we can now proceed to the computer experiments […] on the loops shown in Figure 1.



The return probability for loops 1, 2, and 3 is shown in Figure 2 as a function of return time. The log-log graphs clearly indicate the power-law dependence characteristic of Euclidean geometry, Equation (1). According to Figure 2, the dimensions in loop space for loops 1, 2, and 3 are $d_1 = 1.74$, $d_2 = 0$, and $d_3 = 2.64$, respectively.
FIG. 2. Log-log plot of the return probability as a function of return time for loop 1, loop 2, and loop 3. The dimensions in conformation space for loops 1, 2, and 3 are $d_1 = 1.74$, $d_2 = 0$, and $d_3 = 2.64$. The lines shown have slopes of $-d_i/2$. Inset: probability of return for loops 1 and 3, and for loops 2 and 3, combined. The corresponding dimensions are $d_{13} = 4.40 \approx d_1 + d_3$ and $d_{23} = 2.52 \approx d_2 + d_3$.
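Dimensions such as $d_1$ and $d_3$ are read off from the slope $-d/2$ of the log-log plot. A minimal sketch of that least-squares fit follows. The fitting window (only times before any leveling off should be used, e.g. T below the saturation time for loop 1) and the synthetic test data are assumptions for illustration, not the data of Figure 2.

```python
import math
import random


def fit_dimension(times, probs):
    """Least-squares fit of ln P = c - (d/2) ln T; returns the dimension d = -2 * slope."""
    xs = [math.log(t) for t in times]
    ys = [math.log(p) for p in probs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -2.0 * sxy / sxx


if __name__ == "__main__":
    # Hypothetical synthetic power law with d = 1.74 and 5% multiplicative noise.
    rng = random.Random(0)
    T = list(range(2, 61))
    P = [0.8 * t ** (-1.74 / 2) * math.exp(rng.gauss(0.0, 0.05)) for t in T]
    print(round(fit_dimension(T, P), 2))
```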

To see why the dimension for loop 2 is 0, note that there are only two possible positions for loop 2. At any given time, the probability for loop 2 to be in one position or the other is 1/2, which means that the return probability must be P(T) = 1/2, or ln P(T) = −0.69, consistent with the result shown in Figure 2. The leveling off expected according to Equation (2) in a bounded space is also seen for loop 1. The saturation level (which corresponds to $1/P \approx e^{3.3} \approx 27$) and the saturation time ($T^* \approx 67$) are roughly consistent with both the first and the second line of Equation (2), given that the number of conformations for loop 1 is M = 51 and the number of allowed moves for the native state is $z_0 = 4$. Since loop 3 is much longer, it will undoubtedly exhibit leveling off too, but at longer times, which we did not reach in our Monte Carlo experiment […].

To see the effect of having multiple loops on the loop-space dimension, we examine conformations in which both loops 1 and 3 are allowed to move, and those in which loops 2 and 3 are allowed to move. The reason for this choice of loops is that loops 1 and 3 (and loops 2 and 3) are sufficiently far apart on the conformation that their interactions are negligible, as assumed in our simple analytical estimate. The effective combined dimension for loops 1 and 3 is $d_{13} = 4.40 \approx d_1 + d_3$, and for loops 2 and 3 it is $d_{23} = 2.52 \approx d_2 + d_3$ (see the inset of Figure 2), in agreement with the approximation given in Equation (5).
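The additivity of dimensions for independent loops can be checked on an exactly solvable toy model. In the sketch below (an illustration under assumptions, not the loop model of Figure 1), each of k = 2 "loops" is replaced by an unbounded one-dimensional lattice walk, so $d_1 = d_2 = 1$, and the multinomial sum over the times $t_i$ spent moving each loop is evaluated term by term; the selection probability $f_1$ is a free, hypothetical parameter.

```python
import math


def line_return(t):
    """Toy stand-in for one loop: exact return probability of a simple +/-1 walk
    on an infinite line after t moves (d = 1; zero for odd t)."""
    if t % 2:
        return 0.0
    return math.exp(math.lgamma(t + 1) - 2 * math.lgamma(t // 2 + 1) - t * math.log(2))


def two_loop_return(T, f1, P1=line_return, P2=line_return):
    """Sum over t1 of C(T, t1) f1^t1 f2^(T - t1) P1(t1) P2(T - t1): the multinomial sum
    for k = 2 independent loops, with weights computed in logs to avoid overflow."""
    f2 = 1.0 - f1
    total = 0.0
    for t1 in range(T + 1):
        p1, p2 = P1(t1), P2(T - t1)
        if p1 == 0.0 or p2 == 0.0:
            continue
        log_w = (math.lgamma(T + 1) - math.lgamma(t1 + 1) - math.lgamma(T - t1 + 1)
                 + t1 * math.log(f1) + (T - t1) * math.log(f2))
        total += math.exp(log_w) * p1 * p2
    return total


if __name__ == "__main__":
    for f1 in (0.5, 0.3):
        slope = math.log(two_loop_return(800, f1) / two_loop_return(200, f1)) / math.log(4)
        print(f1, -2 * slope)   # effective dimension, expected close to d1 + d2 = 2
```

At $f_1 = 1/2$ the combined walk is exactly the simple walk on the square lattice (choose an axis, then a direction), so the sum reproduces its return probability; for unequal $f_i$ the effective dimension is still $d_1 + d_2 = 2$.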

The conformational space becomes even more interesting for longer loops such as loop 4 (see Figure 1), which goes from monomer 20 to monomer 33. As Figure 3 indicates, the behavior of P(T) for this loop is consistent with Lobachevskii geometry, in which the return probability follows a power law at small return times but decays exponentially at sufficiently large return times (see Equation (3)). A least-squares fit of P(T) gives $d_4 = 1.5$ and $\Lambda = 4.9 \times 10^{-\ldots}$. Thus, at rather small scales the conformational space of loop 4 is an ordinary fractal graph, while at larger scales it branches exponentially like a Cayley tree. The physical nature of the branchings in conformation space is very simple: when two different pieces of polymer are close together, each piece can move either on one side or on the other side of the second piece, and to switch from one side to the other it has to go back, which is precisely the description of a bifurcation point on the Cayley tree.
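The exponential branching can be illustrated on an infinite Cayley tree in which every node has q neighbors (q is a hypothetical parameter here; the loop-4 conformation graph is certainly not a regular tree). The number of nodes at distance r from the origin is $q(q-1)^{r-1}$, and because only the distance from the origin matters, P(T) can be computed exactly by a one-dimensional recursion. A known result for such trees is that P(T) at even T decays as $\rho^{T}$ with $\rho = 2\sqrt{q-1}/q$, multiplied by a $T^{-3/2}$ power law; only this exponential factor, not the prefactor, corresponds to the exponential term of Eq. (3).

```python
def shell_size(q, r):
    """Number of nodes at distance r from a node of an infinite Cayley tree
    with q neighbors per node (q is a hypothetical illustration parameter)."""
    return 1 if r == 0 else q * (q - 1) ** (r - 1)


def tree_return_probabilities(q, t_max):
    """Exact P(T), T = 0..t_max, for a simple random walk on the Cayley tree,
    tracking only the walker's distance r from the origin."""
    p = [0.0] * (t_max + 2)
    p[0] = 1.0
    out = [1.0]
    for _ in range(t_max):
        new = [0.0] * (t_max + 2)
        new[1] += p[0]                            # from the origin every move goes outward
        for r in range(1, t_max + 1):
            if p[r]:
                new[r + 1] += p[r] * (q - 1) / q  # q - 1 of the q edges lead outward
                new[r - 1] += p[r] / q            # exactly one edge leads back
        p = new
        out.append(p[0])
    return out


if __name__ == "__main__":
    q = 3
    P = tree_return_probabilities(q, 602)
    print([shell_size(q, r) for r in range(6)])   # exponential growth of distance shells
    print(P[602] / P[600], 4 * (q - 1) / q ** 2)  # decay over two steps vs rho^2
```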
FIG. 3. Return probability as a function of return time for loop 4. Since the return time is typically very long, it is difficult to gather statistics. Thus, the data were binned over time intervals of $2 \times 10^{\ldots}$, which is why the log-log plot is not linear at small times. The inset shows the same data on a semi-log scale and indicates exponential behavior at long times, which is consistent with Lobachevskii geometry. The least-squares fit yields $d = 1.5$ and $\Lambda = 4.9 \times 10^{-\ldots}$.
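Because the logarithm of the Lobachevskii form (3) is linear in its unknowns, ln P = c − (d/2) ln T − (Λ/2d) T, a fit of this kind can be done by ordinary linear least squares. The sketch below assumes exactly that form and uses synthetic data with hypothetical values d = 1.5 and Λ = 10^{-3}; it does not reproduce the binned data of Figure 3.

```python
import numpy as np


def fit_lobachevskii(T, P):
    """Least-squares fit of ln P = c - (d/2) ln T - (Lambda / (2 d)) T.
    Returns (d, Lambda)."""
    T = np.asarray(T, dtype=float)
    A = np.column_stack([np.ones_like(T), -np.log(T), -T])   # columns for c, d/2, Lambda/(2d)
    (c, half_d, rate), *_ = np.linalg.lstsq(A, np.log(np.asarray(P, dtype=float)), rcond=None)
    d = 2.0 * half_d
    return d, 2.0 * d * rate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = np.arange(1, 20_001, 50)                     # hypothetical bin centers
    d_true, lam_true = 1.5, 1e-3                     # hypothetical parameters
    P = 0.3 * T ** (-d_true / 2) * np.exp(-T * lam_true / (2 * d_true))
    P_noisy = P * np.exp(rng.normal(0.0, 0.02, size=T.size))
    print(fit_lobachevskii(T, P), fit_lobachevskii(T, P_noisy))
```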

To conclude, simple Monte Carlo techniques are sufficient for obtaining the dimensions of loop conformation spaces. In particular, we have shown that the conformational space has a nontrivial geometry, with noninteger dimensions and with Lobachevskii-type curvature. Returning to the introduction and the relation between Levinthal's paradox and random walks in conformation space, one can ask: how long does it take for a random walk to bring the system into the critical region $\omega$? The most naive estimate of this time, t, would be $t \sim |\Omega|/|\omega|$, where $\Omega$ is the entire conformation space and $|\ldots|$ denotes the number of conformations in the domain. This estimate is consistent with the original Levinthal formulation […], except that $|\omega| = 1$ there. However, we now know that such estimates are valid only in the long-time limit of a walk in a bounded space (second line of Equation (2)). Thus, an understanding of random walks in conformational space is crucial to the understanding of protein folding.
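The long-time estimate $t \sim |\Omega|/|\omega|$ has a precise counterpart for random walks on a finite graph: by Kac's lemma, the mean time between successive visits to a set of nodes $\omega$ equals the inverse of its stationary weight, $\sum_{\alpha\in\Omega} z_\alpha / \sum_{\alpha\in\omega} z_\alpha$, which reduces to $|\Omega|/|\omega|$ when all nodes have the same number of moves. The sketch below checks this by simulation on hypothetical toy graphs, using an assumed lazy walk as a stand-in for Monte Carlo dynamics. It says nothing about the first-passage time from an unfolded coil, which is the harder quantity discussed above.

```python
import random


def kac_mean_return_time(adj, omega):
    """Mean recurrence time to the node set `omega`: sum of all degrees / sum of degrees in omega."""
    return sum(len(nbrs) for nbrs in adj) / sum(len(adj[a]) for a in omega)


def simulated_mean_return_time(adj, omega, n_steps, seed=0):
    """Average number of lazy-walk steps between successive visits to `omega`."""
    rng = random.Random(seed)
    omega = set(omega)
    x = next(iter(omega))
    visits = 0
    for _ in range(n_steps):
        if rng.random() < 0.5:            # assumption: lazy step, move attempted with probability 1/2
            x = rng.choice(adj[x])
        if x in omega:
            visits += 1
    return n_steps / visits


if __name__ == "__main__":
    # Hypothetical toy graph: a path of ten nodes, with omega taken to be one end node.
    path = [[j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)]
    print(kac_mean_return_time(path, [0]), simulated_mean_return_time(path, [0], 500_000))
```

On a regular graph such as a cycle, every node has the same $z_\alpha$, and the prediction is exactly $|\Omega|/|\omega|$.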